We will first load the data frame containing the shapefile of chlamydia incidence and income from the previous lab.
library (sf)
library(here)
library (spdep)
library (ggplot2)
library (spatialreg)
library (INLA)
load (here("data", "Data_Areal.RData") )
st_drop_geometry(dat.areal)[1:4,]
## ID FIPS Area State Population Cases Income inc
## 1 alabama,autauga 01001 Autauga County AL 49960 184 42013 37
## 2 alabama,baldwin 01003 Baldwin County AL 171769 416 40250 24
## 3 alabama,barbour 01005 Barbour County AL 27941 165 25101 59
## 4 alabama,bibb 01007 Bibb County AL 21535 61 31420 28
CAR models can be fitted using the spautolm function in the spatialreg library. The function takes a regression formula similar to lm, together with a spatial weight matrix supplied as an nb2listw object. We will examine both first-order and second-order (neighbors of first-order neighbors) adjacency.
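Concretely (a sketch of the model spautolm fits with family = "CAR"): the outcome follows \(Y = X\beta + u\) with \(u \sim N(0, \sigma^2 (I - \lambda W)^{-1})\), where \(W\) is the binary adjacency matrix and \(\lambda\) is the spatial dependence parameter reported as Lambda in the summary.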
dat.areal_proj = st_transform(dat.areal, crs = "ESRI:102004")
nb = poly2nb (dat.areal_proj)
nb = nblag (nb, 2) #Create 2nd order and this makes nb into a list
#2nd-order has about twice as many links
nb[[1]]
## Neighbour list object:
## Number of regions: 226
## Number of nonzero links: 1258
## Percentage nonzero weights: 2.462996
## Average number of links: 5.566372
nb[[2]]
## Neighbour list object:
## Number of regions: 226
## Number of nonzero links: 2534
## Percentage nonzero weights: 4.961234
## Average number of links: 11.21239
#Create adjacency matrix:
#Style B = binary for edges
W = nb2listw(nb[[1]], style="B")
W2 = nb2listw(nb[[2]], style="B")
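To build intuition for what nblag computes, here is a toy base-R sketch (a hypothetical 4-region chain, not the county data): second-order neighbors correspond to paths of length two in the binary adjacency matrix that are not already first-order links.

```r
# Toy: 4 regions in a chain 1-2-3-4; binary adjacency matrix
A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 2, 3, 3, 4), c(2, 1, 3, 2, 4, 3))] <- 1
# Paths of length two (excluding self-loops) give candidate 2nd-order links
A2 <- (A %*% A > 0) * 1
diag(A2) <- 0
# Keep only pairs that are NOT already 1st-order neighbors
second_order <- (A2 == 1) & (A == 0)
sum(second_order)  # 4 directed links: (1,3), (3,1), (2,4), (4,2)
```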
## Standard linear regression
Y = dat.areal$inc
X = (dat.areal$Income-mean (dat.areal$Income))/1000 #center and scale to per $1,000
fit0 = lm (Y~X)
##Fit CAR model
fit1.car = spautolm (Y~X, family = "CAR", listw = W)
fit2.car = spautolm (Y~X, family = "CAR", listw = W2)
The summary from the model fit gives output similar to linear regression. The new parameter lambda corresponds to the CAR spatial dependence parameter.
summary (fit1.car)
##
## Call: spautolm(formula = Y ~ X, listw = W, family = "CAR")
##
## Residuals:
## Min 1Q Median 3Q Max
## -46.2595 -13.2726 -4.3352 9.3328 72.2607
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 41.90003 3.54660 11.8142 < 2.2e-16
## X -1.32857 0.21805 -6.0931 1.107e-09
##
## Lambda: 0.15241 LR test value: 46.182 p-value: 1.0778e-11
## Numerical Hessian standard error of lambda: 0.01002
##
## Log likelihood: -1018.587
## ML residual variance (sigma squared): 430.96, (sigma: 20.76)
## Number of observations: 226
## Number of parameters estimated: 4
## AIC: NA (not available for weighted model)
Instead of working with a continuous rate (assumed Normal), next we will model the case counts directly using Poisson regression. We will compare these models to a naive analysis that does not borrow information across neighboring counties.
Y = dat.areal$Cases
P = dat.areal$Population
X = (dat.areal$Income-mean (dat.areal$Income))/1000 #center and scale to per $1,000
#First define adjacency matrix
nb = poly2nb (dat.areal_proj)
W = nb2mat (nb, style = "B") #B = binary
The maximum likelihood estimate for the incidence rate is given by \(Y_i/P_i\). We also calculate the standard error associated with \(Y_i/P_i\), which we will compare to estimates from spatial models.
#MLE Estimates
u.mle = Y/P
se.mle = sqrt(Y)/P
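As a quick sanity check of the formula (toy numbers; Y_toy and P_toy are made up): for Poisson counts, \(Var(Y_i/P_i) = Y_i/P_i^2\), so the standard error of the rate is \(\sqrt{Y_i}/P_i\).

```r
# Toy check (made-up Y_toy, P_toy): SE of a Poisson rate estimate
Y_toy <- 100; P_toy <- 50000
rate_toy <- Y_toy / P_toy        # 0.002
se_toy   <- sqrt(Y_toy) / P_toy  # 2e-04, i.e. 10% relative error at 100 cases
```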
We will use INLA here to perform Bayesian estimation. INLA is an efficient approach for Bayesian analysis of hierarchical models when the latent variables (random effects) are Gaussian. Here we will assume the random effects are either iid (exchangeable) or follow a spatial structure (e.g., Besag or proper Leroux). Specific parametrizations of these random effect distributions and priors for the hyperparameters can be found here (https://inla.r-inla-download.org/r-inla.org/doc/latent/).
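In notation (a sketch of the models fitted below): each model assumes \(Y_i \sim \text{Poisson}(P_i \mu_i)\). The exchangeable model takes \(\log \mu_i = \beta_0 + \theta_i\) with \(\theta_i\) iid Normal; the Besag and Leroux models replace \(\theta_i\) with a spatially structured \(\gamma_i\); and the convolution (BYM) model includes both components, \(\log \mu_i = \beta_0 + \gamma_i + \theta_i\).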
#Some data wrangling first
dat.areal$areal_ID = 1:nrow (dat.areal) #Create a county ID from 1 to 226
#INLA needs a specific summary of adjacency structure
nb2INLA ("adj.txt", nb) #write out a file
G <- inla.read.graph(filename = "adj.txt") #read in to get INLA's graph format
#Fit exchangeable (iid) random effect model
#Note: "E" here is the offset and it does not need to be log-transformed
fit.exch = inla (Cases~1+f(areal_ID), E= Population, family = "poisson", data = dat.areal,
control.compute = list(dic = TRUE, waic = TRUE, return.marginals.predictor=TRUE))
#Fit improper CAR model (besag)
fit.iCAR = inla (Cases~1+f(areal_ID, model = "besag", graph = G), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))
#Fit proper CAR model (Leroux)
fit.pCAR = inla (Cases~1+f(areal_ID, model = "besagproper2", graph = G), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))
#Fit BYM model (besag + iid)
#Need to create another set of county IDs because we have two random effect distributions
dat.areal$areal_ID_2 = 1:nrow (dat.areal) #Create a county ID from 1 to 226
fit.conv = inla (Cases~1+f(areal_ID, model = "besag", graph = G) +
f(areal_ID_2), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))
#We can use summary to get a lot of model information.
summary (fit.conv)
##
## Call:
## c("inla.core(formula = formula, family = family, contrasts = contrasts,
## ", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
## scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
## ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
## verbose, ", " lincomb = lincomb, selection = selection, control.compute
## = control.compute, ", " control.predictor = control.predictor,
## control.family = control.family, ", " control.inla = control.inla,
## control.fixed = control.fixed, ", " control.mode = control.mode,
## control.expert = control.expert, ", " control.hazard = control.hazard,
## control.lincomb = control.lincomb, ", " control.update =
## control.update, control.lp.scale = control.lp.scale, ", "
## control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
## ", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
## num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
## working.directory = working.directory, ", " silent = silent, inla.mode
## = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
## .parent.frame)")
## Time used:
## Pre = 0.371, Running = 0.681, Post = 0.021, Total = 1.07
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -5.569 0.023 -5.616 -5.569 -5.523 NA 0
##
## Random effects:
## Name Model
## areal_ID Besags ICAR model
## areal_ID_2 IID model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for areal_ID 1.77 0.478 0.949 1.73 2.81 NA
## Precision for areal_ID_2 9.87 3.194 5.428 9.27 17.81 NA
##
## Deviance Information Criterion (DIC) ...............: 1921.25
## Deviance Information Criterion (DIC, saturated) ....: -600.02
## Effective number of parameters .....................: 213.76
##
## Watanabe-Akaike information criterion (WAIC) ...: 1863.59
## Effective number of parameters .................: 112.78
##
## Marginal log-Likelihood: -1429.88
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
#Get the fixed effects out
fixed.eff = rbind (fit.exch$summary.fixed, fit.iCAR$summary.fixed, fit.pCAR$summary.fixed, fit.conv$summary.fixed)
fixed.eff = data.frame (Model = c("Exch", "iCAR", "pCAR", "Conv"), round(fixed.eff, 2))
fixed.eff
## Model mean sd X0.025quant X0.5quant X0.975quant mode kld
## (Intercept) Exch -5.57 0.04 -5.66 -5.57 -5.49 NA 0
## (Intercept)1 iCAR -5.57 0.01 -5.59 -5.57 -5.55 NA 0
## (Intercept)2 pCAR -5.57 0.21 -5.99 -5.57 -5.15 NA 0
## (Intercept)3 Conv -5.57 0.02 -5.62 -5.57 -5.52 NA 0
#INLA reports precision = 1/variance. Let's convert them using the estimated posterior marginal distribution
#First create a custom function
prec_to_var = function (marg){ data.frame(inla.zmarginal(inla.tmarginal(function(x){(1/x)},marg), silent=TRUE))}
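Transforming the full marginal matters because the inverse is nonlinear: as a toy illustration with simulated draws (a hypothetical Gamma "posterior", not INLA output), the mean of \(1/x\) differs from \(1/\text{mean}(x)\).

```r
# Toy: hypothetical Gamma(shape = 3, rate = 1) "posterior" for a precision
set.seed(1)
prec_draws <- rgamma(1e5, shape = 3, rate = 1)
var_draws  <- 1 / prec_draws
mean(var_draws)       # ~0.5 (analytically, rate/(shape - 1))
1 / mean(prec_draws)  # ~0.33: inverting the posterior mean gives a different answer
```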
var.est = rbind ( prec_to_var(fit.exch$marginals.hyperpar[[1]]),
prec_to_var(fit.iCAR$marginals.hyperpar[[1]]),
prec_to_var(fit.pCAR$marginals.hyperpar[[1]]),
prec_to_var(fit.conv$marginals.hyperpar[[1]]),
prec_to_var(fit.conv$marginals.hyperpar[[2]]))
var.est = data.frame (Model = c("Exch", "iCAR", "pCAR", "Conv", "Conv"),
Type = c("Sigma2", "Tau2", "Tau2", "Tau2", "Sigma2"), round(var.est, 2))
var.est
## Model Type mean sd quant0.025 quant0.25 quant0.5 quant0.75 quant0.975
## 1 Exch Sigma2 0.42 0.04 0.35 0.39 0.42 0.45 0.51
## 2 iCAR Tau2 1.21 0.12 1.00 1.13 1.20 1.29 1.47
## 3 pCAR Tau2 1.09 0.14 0.86 1.00 1.08 1.18 1.40
## 4 Conv Tau2 0.61 0.18 0.36 0.48 0.58 0.70 1.05
## 5 Conv Sigma2 0.11 0.03 0.06 0.09 0.11 0.13 0.18
Let’s compare the estimated relative risks across models. We first collect the estimates and their standard errors from the INLA models (scaled to rates per 10,000 people).
## Extract estimates from each INLA model and scale to per 10,000 population
dat.areal$rr.mle = u.mle*10000
dat.areal$se.mle = se.mle*10000
dat.areal$rr.exch = fit.exch$summary.fitted.values$mean*10000
dat.areal$se.exch = fit.exch$summary.fitted.values$sd*10000
dat.areal$rr.iCAR = fit.iCAR$summary.fitted.values$mean*10000
dat.areal$se.iCAR = fit.iCAR$summary.fitted.values$sd*10000
dat.areal$rr.conv = fit.conv$summary.fitted.values$mean*10000
dat.areal$se.conv = fit.conv$summary.fitted.values$sd*10000
We see that the point estimates are fairly similar across models and capture similar spatial patterns.
line01 <- function(x,y,...){abline(0,1, col = 4,lwd=2); points(x,y,col=2)}
plot(data.frame(MLE = dat.areal$rr.mle, Exchangeable = dat.areal$rr.exch,
iCAR = dat.areal$rr.iCAR, Convolution = dat.areal$rr.conv), panel=line01)
plot.dat = data.frame (FIPS = dat.areal$FIPS,
type =rep(c("MLE", "Exchangeable", "iCAR", "Convolution"), each = nrow (dat.areal)),
rr = c(as.matrix(st_drop_geometry(dat.areal[,c("rr.mle", "rr.exch", "rr.iCAR", "rr.conv")]))))
plot.dat$type = factor(plot.dat$type, levels = c("MLE", "Exchangeable", "iCAR", "Convolution"))
plot.dat = merge (dat.areal, plot.dat, by = "FIPS")
ggplot () + geom_sf (data = plot.dat, aes (fill = rr)) + facet_wrap(~type)+
scale_fill_gradient2(low = "white",high = "red", name = "Rate")
Let’s compare just the convolution model and the naive estimates. We see that the larger differences in county-specific relative risks occur in counties with fewer cases and smaller populations. The difference in precision (measured by the standard error of the relative risk estimates) also depends on population and case number.
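This pattern is classic shrinkage. A stylized base-R sketch (all numbers and the weight function are made up, not the INLA posterior): a precision-weighted average pulls the low-count county much closer to the overall rate than the high-count county.

```r
# Toy shrinkage: two counties with the same raw rate but different case counts
global_rate <- 0.003
y <- c(2, 200); p <- c(1000, 100000)
raw <- y / p                  # both raw rates equal 0.002
w   <- y / (y + 10)           # hypothetical weight rising with information
shrunk <- w * raw + (1 - w) * global_rate
shrunk  # small county ends near 0.0028, large county near 0.0020
```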
par (mfrow=c(1,2))
plot ( sqrt (Y), dat.areal$rr.mle - dat.areal$rr.conv,
ylab = "Difference in RR", xlab = "SQRT (Cases)"); abline (0,0, col = "blue")
plot ( log(P), dat.areal$rr.mle - dat.areal$rr.conv,
ylab = "Difference in RR", xlab = "Ln (Population)"); abline (0,0, col = "blue")
par (mfrow=c(1,2))
plot ( sqrt (Y), dat.areal$se.mle - dat.areal$se.conv,
ylab = "Difference in SE(RR)", xlab = "SQRT (Cases)"); abline (0,0, col = "blue")
plot ( log(P), dat.areal$se.mle - dat.areal$se.conv,
ylab = "Difference in SE(RR)", xlab = "Ln (Population)"); abline (0,0, col = "blue")
Finally, let’s add household median income as a spatial covariate. We see that the spatial model gives a stronger negative association between income and chlamydia rates. The standard error is also much larger compared to a standard Poisson regression.
dat.areal$Income_center = (dat.areal$Income-mean (dat.areal$Income))/1000
fit = inla (Cases~Income_center+f(areal_ID, model = "besag", graph = G) +
f(areal_ID_2), E= Population, family = "poisson", data = dat.areal,control.compute = list(dic = TRUE, waic = TRUE))
summary (fit)
##
## Call:
## c("inla.core(formula = formula, family = family, contrasts = contrasts,
## ", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
## scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
## ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
## verbose, ", " lincomb = lincomb, selection = selection, control.compute
## = control.compute, ", " control.predictor = control.predictor,
## control.family = control.family, ", " control.inla = control.inla,
## control.fixed = control.fixed, ", " control.mode = control.mode,
## control.expert = control.expert, ", " control.hazard = control.hazard,
## control.lincomb = control.lincomb, ", " control.update =
## control.update, control.lp.scale = control.lp.scale, ", "
## control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
## ", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
## num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
## working.directory = working.directory, ", " silent = silent, inla.mode
## = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
## .parent.frame)")
## Time used:
## Pre = 0.402, Running = 1.1, Post = 0.0892, Total = 1.59
## Fixed effects:
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -5.564 0.016 -5.597 -5.564 -5.531 NA 0
## Income_center -0.034 0.005 -0.044 -0.034 -0.024 NA 0
##
## Random effects:
## Name Model
## areal_ID Besags ICAR model
## areal_ID_2 IID model
##
## Model hyperparameters:
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for areal_ID 1.48 0.225 1.02 1.46 1.92 NA
## Precision for areal_ID_2 23.45 9.370 11.69 21.94 49.54 NA
##
## Deviance Information Criterion (DIC) ...............: 1919.62
## Deviance Information Criterion (DIC, saturated) ....: -601.65
## Effective number of parameters .....................: 211.08
##
## Watanabe-Akaike information criterion (WAIC) ...: 1864.94
## Effective number of parameters .................: 113.08
##
## Marginal log-Likelihood: -1416.25
## is computed
## Posterior summaries for the linear predictor and the fitted values are computed
## (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
prec_to_var(fit$marginals.hyperpar[[1]])
## mean sd quant0.025 quant0.25 quant0.5 quant0.75 quant0.975
## 1 0.6938514 0.1133778 0.5252033 0.6115126 0.6742112 0.7570979 0.9648019
prec_to_var(fit$marginals.hyperpar[[2]])
## mean sd quant0.025 quant0.25 quant0.5 quant0.75 quant0.975
## 1 0.04837744 0.01610103 0.02108057 0.03640378 0.04725156 0.05884403 0.08283049
##Compare to other GLM
dat.areal$logPop = log(dat.areal$Population)
summary (glm (Cases~offset (logPop) + Income_center, family = "poisson", data = dat.areal))
##
## Call:
## glm(formula = Cases ~ offset(logPop) + Income_center, family = "poisson",
## data = dat.areal)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.140 -7.805 -2.782 0.299 42.374
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.1873009 0.0043950 -1180.27 <2e-16 ***
## Income_center -0.0228930 0.0003785 -60.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 21775 on 225 degrees of freedom
## Residual deviance: 17893 on 224 degrees of freedom
## AIC: 19376
##
## Number of Fisher Scoring iterations: 4
summary (glm (Cases~offset (logPop) + Income_center, family = "quasipoisson", data = dat.areal))
##
## Call:
## glm(formula = Cases ~ offset(logPop) + Income_center, family = "quasipoisson",
## data = dat.areal)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -19.140 -7.805 -2.782 0.299 42.374
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.187301 0.038599 -134.390 < 2e-16 ***
## Income_center -0.022893 0.003324 -6.888 5.66e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 77.13112)
##
## Null deviance: 21775 on 225 degrees of freedom
## Residual deviance: 17893 on 224 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
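The income coefficients can be read on the rate-ratio scale by exponentiating (values taken from the model summaries above):

```r
# Rate ratio per $1,000 increase in median household income
exp(-0.034)    # spatial (BYM) model: ~0.967, i.e. ~3.3% lower rate
exp(-0.022893) # standard Poisson GLM: ~0.977, i.e. ~2.3% lower rate
```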
#Examine random effects from the model
dat.areal$gamma_est = fit$summary.random[[1]]$mean
dat.areal$gamma_se = fit$summary.random[[1]]$sd
dat.areal$theta_est = fit$summary.random[[2]]$mean
dat.areal$theta_se = fit$summary.random[[2]]$sd
ggplot () + geom_sf (data = dat.areal, aes (fill = gamma_est))+
scale_fill_gradient2(low = "white",high = "red", name = "Gamma")+ggtitle("Spatial Random Effect Estimates")
ggplot () + geom_sf (data = dat.areal, aes (fill = theta_est))+
scale_fill_gradient2(low = "white",high = "red", name = "Theta")+ggtitle("Independent Random Effect Estimates")